llama: add initial support for Falcon-H1 model family #14534

Merged: 112 commits into ggml-org:master on Jul 9, 2025
Conversation

@ibrahimkhadraoui (Contributor) commented Jul 4, 2025

Fixes: #13681
Summary
• Adds initial support for the Falcon-H1 model family.
• Implements model loading, basic inference, and tokenizer integration for Falcon-H1.
• Updates the build scripts and relevant documentation for Falcon-H1 compatibility.

Details
• Adapted model architecture and layer mapping to match Falcon-H1.
• Integrated Falcon-H1 tokenizer with automatic fallback if tokenizer files are missing.
• Added new test cases to verify Falcon-H1 model loading and inference.
• Cleaned up redundant code from previous Falcon integration attempts.

Notes
• The Falcon-H1 integration follows the same approach as other model families (see llama and Mamba support).
• This supersedes #14238 with a cleaner and more modular implementation.
• Refer to the Falcon-H1 repo for model weights and tokenizer files.
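For illustration, a converted Falcon-H1 GGUF loads through the same public llama.h C API as any other architecture. A minimal sketch, assuming a converted file named falcon-h1-7b.gguf (the file name and paths are illustrative, not part of this PR):

```cpp
#include "llama.h"
#include <cstdio>

int main(int argc, char ** argv) {
    // hypothetical output of convert_hf_to_gguf.py
    const char * path = argc > 1 ? argv[1] : "falcon-h1-7b.gguf";

    llama_backend_init();

    llama_model_params mparams = llama_model_default_params();
    llama_model * model = llama_model_load_from_file(path, mparams);
    if (model == nullptr) {
        fprintf(stderr, "failed to load model from %s\n", path);
        return 1;
    }
    fprintf(stderr, "model loaded: %s\n", path);

    llama_model_free(model);
    llama_backend_free();
    return 0;
}
```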

@github-actions bot added the "python" (python script changes) label on Jul 4, 2025
@jacekpoplawski

Is the old GGUF still valid, or must a new one be generated?

@ibrahimkhadraoui (Contributor, Author)

@jacekpoplawski
You need to re-convert with hf_to_gguf; we made some changes to the data types.
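For anyone landing here later: re-conversion uses the repo's convert_hf_to_gguf.py script. A typical invocation, with an illustrative model path and output name, looks like:

```sh
python convert_hf_to_gguf.py /path/to/Falcon-H1-model \
    --outfile falcon-h1.gguf --outtype f16
```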

@younesbelkada (Contributor) commented Jul 9, 2025

We just fixed the GGUFs on HF after the comments from @compilade and @CISC, and we can confirm everything is working well now across all model sizes.

@compilade (Collaborator)

I'm starting to think it might be simpler to merge this before #7531, and then I could fix the conflicts caused by the hybrid graph input changes made there.

Otherwise you'd also have to deal with the conflicts in the tensor mappings, and then with my follow-up comments suggesting a common mamba2 layer builder (which is only possible with the hybrid graph input changes).

@younesbelkada (Contributor)

That would be great, thank you very much @compilade !

@younesbelkada (Contributor)

@compilade for the common mamba2 layer: I noticed that build_mamba2_layer currently assumes the grouped RMS norm is always present; we'll need to make the RMS norm optional in the common builder so it stays compatible with H1.
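A minimal standalone sketch of that idea, with a boolean flag standing in for "this architecture has the ssm_norm tensor" (names and signatures are illustrative, not the actual llama.cpp graph-building API):

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// RMS-normalize one contiguous group of values in place.
static void rms_norm_group(float * x, size_t n, float eps) {
    float ss = 0.0f;
    for (size_t i = 0; i < n; ++i) ss += x[i] * x[i];
    const float scale = 1.0f / std::sqrt(ss / n + eps);
    for (size_t i = 0; i < n; ++i) x[i] *= scale;
}

// Shared builder: `has_ssm_norm` stands in for "layer.ssm_norm != nullptr"
// in the real code, so architectures without the tensor skip the norm.
static std::vector<float> build_ssm_output(std::vector<float> y, int n_groups,
                                           bool has_ssm_norm, float eps = 1e-6f) {
    if (has_ssm_norm) {
        // grouped norm: normalize each contiguous group independently
        const size_t group_size = y.size() / n_groups;
        for (int g = 0; g < n_groups; ++g) {
            rms_norm_group(y.data() + g * group_size, group_size, eps);
        }
    }
    return y; // Falcon-H1 path: passthrough, no grouped norm
}

int main() {
    const std::vector<float> y = {1.0f, 2.0f, 3.0f, 4.0f};
    auto mamba2 = build_ssm_output(y, /*n_groups=*/2, /*has_ssm_norm=*/true);
    auto h1     = build_ssm_output(y, /*n_groups=*/2, /*has_ssm_norm=*/false);
    std::printf("mamba2[0]=%.3f  h1[0]=%.3f\n", mamba2[0], h1[0]);
    return 0;
}
```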

@CISC CISC merged commit 0465506 into ggml-org:master Jul 9, 2025
1 check passed
@CISC (Collaborator) commented Jul 9, 2025

Damnit, missed some whitespace. :(

@CISC (Collaborator) commented Jul 9, 2025

> Damnit, missed some whitespace. :(

It's really weird, GitHub doesn't show it being added, and everything looked fine in diffs...

@younesbelkada (Contributor)

Thank you very much @ibrahimkhadraoui @ggerganov @compilade @CISC @gabe-l-hart for all your help and effort for this integration and other upcoming integrations!

```
@@ -1024,6 +1025,30 @@ static const std::map<llm_arch, std::map<llm_tensor, const char *>> LLM_TENSOR_N
        { LLM_TENSOR_SSM_OUT, "blk.%d.ssm_out" },
    },
},
{
    LLM_ARCH_FALCON_H1,
```

Review comment from a Contributor:

One thing I've noticed while working through merge conflicts with GR4: the Falcon-H1 entries in the various model architecture lists are ordered inconsistently (next to FALCON in one place and after ERNIE_4_5 in two places in constants.py, and after MAMBA2 on the C++ side). Do we want to make this consistent everywhere? I suspect you'll hit this during your merge-conflict resolution, @compilade
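For context, the hunk above extends LLM_TENSOR_NAMES, the per-architecture table mapping tensor enums to printf-style GGUF name templates. A toy sketch of the pattern (enum values and table contents are illustrative, not the exact llama.cpp code):

```cpp
#include <cstdio>
#include <map>

enum llm_tensor { SSM_OUT, ATTN_Q };

int main() {
    // per-architecture mapping from tensor id to a per-layer name template
    const std::map<llm_tensor, const char *> falcon_h1_names = {
        { SSM_OUT, "blk.%d.ssm_out" },
        { ATTN_Q,  "blk.%d.attn_q"  },
    };

    // the %d slot is filled with the layer index when tensors are resolved
    char buf[64];
    std::snprintf(buf, sizeof(buf), falcon_h1_names.at(SSM_OUT), /*layer=*/3);
    std::printf("%s\n", buf); // prints: blk.3.ssm_out
    return 0;
}
```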

gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request Jul 9, 2025
* origin/master:
ggml : prevent integer overflow in gguf tensor size calculation (ggml-org#14595)
model : add skt/A.X-4.0 model vocabulary (ggml-org#14589)
llama : remove unintended whitespace (ggml-org#14592)
model : add support for Falcon-H1 family (ggml-org#14534)
convert : fix smollm3 jinja template (ggml-org#14586)
ryan-mangeno pushed a commit to ryan-mangeno/llama.cpp that referenced this pull request Jul 9, 2025
* v1

* push more fixes

* another fix

* fix

* more fixes

* minor fix

* more cleaning on python code

* python fixes

* changed precision for multipliers float 32->64

* fixes

* another fix

* fix

* pre-norm -> norm

* fix

* Revert "fix"

This reverts commit 243e4d1.

* fix

* small fix ffn_norm

* try

* mix instead of max

* fix vocab size

* conflict solve

* fixed multipliers

* falcon-h1 specefic vocab resolved

* read arch from gguf.MODEL_ARCH

* mamba_d_ssm added to d_inner find_hparam

* remove unused functions from gguf_writer.py

* override modify_tensors instead of get_tensors

* fix conversion and d_inner

* added some cb functions for debugging puposes

* inp_out_ids moved outside of layers loop

* mup_vec create as float64

* fix rope_theta

* injected mup

* clean ups

* rm extra space

* rm unused MAMBA_CHUNK_SIZE

* rm unused key

* add bos False

* changed ROPE_TYPE

* cleaning debugging stuff

* cleaning debug quant

* fix comment

* some cleanups

* some cleanups

* Update src/llama-model-loader.cpp

* more cleanups

* moe cleanuips

* d_ssm -> d_inner;

* cleaning unused hparams

* cleanup

* more cleanups

* more cleanups on python conversion;

* minor cleanups

* Apply suggestions from code review

Co-authored-by: Georgi Gerganov <[email protected]>

* remove todo

* added falcon-h1

* tensor not required

* clean

* remove unneeded attributes

* more cleanups and fixed conversion

* remove final_norm

* flake8 fixes

* Update src/llama-model.cpp

Co-authored-by: Sigbjørn Skjæret <[email protected]>

* flake8 fixes

* Update src/llama-hparams.cpp

Co-authored-by: Sigbjørn Skjæret <[email protected]>

* Update src/llama-model.cpp

Co-authored-by: Sigbjørn Skjæret <[email protected]>

* Update src/llama-model.cpp

Co-authored-by: Sigbjørn Skjæret <[email protected]>

* Update src/llama-arch.cpp

Co-authored-by: Sigbjørn Skjæret <[email protected]>

* Update convert_hf_to_gguf.py

Co-authored-by: Sigbjørn Skjæret <[email protected]>

* added hashes

* Update src/llama-arch.cpp

Co-authored-by: Georgi Gerganov <[email protected]>

* Update src/llama-vocab.cpp

Co-authored-by: Georgi Gerganov <[email protected]>

* update the update file

* Revert "update the update file"

This reverts commit 082ab4a.

* fix: address suggestions

* fix: update convert_hf_to_gguf.py

* Update gguf-py/gguf/constants.py

Co-authored-by: Sigbjørn Skjæret <[email protected]>

* Update src/llama-model-loader.cpp

Co-authored-by: Sigbjørn Skjæret <[email protected]>

* d_inner fixed

* Update src/llama-model.cpp

Co-authored-by: Sigbjørn Skjæret <[email protected]>

* reshaping ssm_norm for 34B

* removing generate_mup

* remove duplicates metadata keys

* rm comment

* final comment

* fix unused args

* fix constants

* fix bad merge

* Update src/llama-model.cpp

Co-authored-by: compilade <[email protected]>

* falcon-h1: remove unused ssm_in_b and bad merge

* Update src/llama-model.cpp

Co-authored-by: Sigbjørn Skjæret <[email protected]>

* falcon-h1: fix last comment

* Update convert_hf_to_gguf.py

Co-authored-by: compilade <[email protected]>

* falcon-h1: revert add_add_bos(False)

* falcon-h1: fix tied weights

* falcon-h1: remove whitespace

* falcon-h1: fix wrong size param

* falcon-h1: fix whitespace issues

---------

Co-authored-by: younesbelkada <[email protected]>
Co-authored-by: Younes B <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: Sigbjørn Skjæret <[email protected]>
Co-authored-by: compilade <[email protected]>
Signed-off-by: ryan-mangeno <[email protected]>
qnixsynapse pushed a commit to menloresearch/llama.cpp that referenced this pull request Jul 10, 2025
@ggerganov added the "hot" (Something that is hot) label on Jul 11, 2025
Labels: hot (Something that is hot), python (python script changes)
Projects: none yet
Development: successfully merging this pull request may close Feature Request: Falcon-H1 (#13681)
7 participants